We review recent developments in measuring the dynamics of the response properties of auditory cortical neurons to broadband sounds, a topic closely related to the perception of timbre. The emphasis is on a method that characterizes the spectro-temporal properties of single neurons using dynamic, broadband sounds, akin to the drifting gratings used in vision. The method treats the spectral and temporal aspects of the response on an equal footing.

1 Introduction



1.1 Timbre 

We classify everyday natural sounds by their loudness (related to the intensity of the sound), their pitch (the perceived tonal height) and their timbre (the quality of the sound; that which is neither loudness nor pitch). The perception of timbre, which will be the main focus of this paper, is what allows us to tell the difference between two vowels spoken with the same pitch, or between a clarinet and an oboe playing the same note. When hearing several musical instruments simultaneously, we can usually tell which instruments are playing by identifying the different timbres present in the mixed sound. Additionally, the perception of timbre is quite robust to noise and echoes (or reverberation), and even to severe degradation such as the strong band-pass filtering of a telephone conversation. Timbre perception is therefore an essential attribute of our sense of hearing.

To understand how we extract these different aspects of a sound, we must determine how sound is represented along the auditory pathway. The approach presented here assumes that the principles used by neural systems are universal once the stimulus has passed beyond the sensory epithelium (whether the basilar membrane of the cochlea or the retina). In particular, the ideas presented here are often guided by treating the basilar membrane as a spatial axis, analogous to a one-dimensional retina, and then using the methods of visual gratings (drifting and otherwise) to study and characterize cells in the auditory cortex.



1.2 Auditory Cortex 

A few general organizational features have long been recognized in Primary Auditory Cortex (AI), whose location is shown in Figure 1. First is a spatially ordered tonotopic axis, along which cell responses are tuned from low to high frequencies; this is also called a cochleotopic axis, since it reflects the activity along the cochlea. Note that there are many fields in the auditory cortical area (the Anterior Auditory Field is also shown in Figure 1), most of which display a tonotopic organization.

Figure 1: The position of the Primary Auditory Cortex (AI) in the ferret brain. The location of the Anterior Auditory Field (AAF) is also shown for illustration purposes. On the right, the tonotopic axis is overlaid for both AI and AAF.

Second, perpendicular to the tonotopic axis, cells are arranged in alternating bands according to binaural properties: bands of cells are alternately excited or inhibited by stimulation of the ipsilateral ear (the contralateral ear usually produces an excitatory response). The tonotopic and binaural dominance organization is analogous to the retinotopic and ocular dominance columns of visual cortex. Other parameters have also been used to describe characteristics that change systematically along isofrequency lines. Using combinations of two pure tones, one can measure the Response Area (RA), also known as the frequency-threshold curve, i.e. the response threshold of a cell as a function of the frequency of the tone presented. Most RAs have been shown to be topographically organized along the isofrequency lines according to the symmetry of their excitatory and inhibitory sidebands. Other parameters have also been shown to change systematically in cat, such as threshold, bandwidth and frequency modulation direction selectivity.

These properties of AI cells are derived using pure tones (or clicks), akin to using dots of light (or flashes) to study cells in the visual pathway. Below we explain how to use the auditory version of drifting gratings to characterize the response properties of cells to dynamic broadband sounds, which is necessary to gain insight into how timbre is encoded. Another advantage of the method presented here is that it determines the temporal and spectral properties of a cell at the same time. In particular, one can study whether, and to what extent, the response field varies as a function of time, thereby characterizing the cell with a full spectro-temporal response field.

2 Background 
2.1 Response Field 

Traditionally, cells along the auditory pathway have been characterized by their RA, or tuning curve. Determined with pure tones by varying the frequency of the stimulus while adjusting its intensity, the RA is the set of frequency-intensity combinations that elicit a threshold response, measured either as the sustained activity level or as the strength of the onset response. In this paper, we use instead the Response Field (RF), a function measured using broadband sounds. As illustrated in Figure 2, it roughly reflects the range of frequencies that influence the discharge properties of the neuron under study. Positive values of the RF describe excitation (with strength proportional to the RF's amplitude) and negative values describe inhibition. In general, the RF is a spectro-temporal function, as opposed to the RA, which typically describes only static properties (but see Nelken et al. and Sutter et al.). The definition of the RF will be made more precise later.

Figure 2: Two idealized RFs at a given time. One RF (unbroken line) is centered on low frequencies and is asymmetric, and the other (broken line) is centered on high frequencies and is symmetric. Horizontal axis: frequency, 0.5 to 16 kHz (0 to 5 octaves).

Figure 3: The spectrum of /aa/ spoken by one of the authors, with the spectral envelope superimposed on it. Horizontal axis: Frequency (Hz).

2.2 Natural Sounds 

Natural sounds, such as environmental sounds, music and speech, are classified along several perceptual axes. We typically describe a sound by its loudness, its pitch and its timbre. Pitch is what changes when we pronounce the same vowel with different tonal heights; e.g. the pitch of a female voice is typically higher than that of a male voice. Timbre is what changes when, keeping the same tonal height, we pronounce different vowels (e.g. /ah/, /eh/, /ih/). Figure 3 illustrates the spectral profile, or envelope, of a sound. The envelope of a sound can be viewed as a smooth, low-order fit to the (time-windowed) spectrum of the sound. A common method for extracting the envelope is Linear Predictive Coding (LPC); we will not go into the details of LPC here, and instead refer the reader to the intuitive notion of envelope illustrated in Figure 3.
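To make the envelope idea concrete, here is a minimal sketch of the autocorrelation method of LPC (the Levinson-Durbin recursion) in NumPy. It is an illustration, not the authors' pipeline: the function names, the model order and the synthetic AR(2) test signal (a single "formant" at 0.1 cycles/sample) are assumptions. The envelope is the all-pole spectrum sqrt(err)/|A(e^{jω})|.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p,
    so the predictor is x[n] ~ -sum_k a[k] x[n-k]; err is the residual power.
    """
    n = len(frame)
    # Biased autocorrelation estimate for lags 0..order
    r = np.array([np.dot(frame[: n - k], frame[k:]) / n for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_envelope(a, err, nfft=1024):
    """Smooth spectral envelope sqrt(err)/|A(e^{j omega})| on nfft//2 + 1 bins."""
    return np.sqrt(err) / np.abs(np.fft.rfft(a, nfft))

# Hypothetical test signal: AR(2) process with a resonance at 0.1 cycles/sample
rng = np.random.default_rng(0)
r_pole, theta = 0.95, 2 * np.pi * 0.1
c1, c2 = 2 * r_pole * np.cos(theta), -r_pole**2
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(e)):
    x[n] = c1 * x[n - 1] + c2 * x[n - 2] + e[n]

a, err = lpc(x, order=2)
env = lpc_envelope(a, err)
freqs = np.fft.rfftfreq(1024)            # cycles/sample
print(a)                                 # close to [1, -c1, -c2]
print(freqs[np.argmax(env)])             # close to 0.1
```

For real speech one would apply LPC to short windowed frames with a higher order (roughly one pole pair per expected formant); the single long frame here only keeps the check simple.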

The percept of timbre has typically been ascribed to the extraction of the envelope of the spectrum, but it also includes the temporal variations in the spectral envelope. For instance, a piano note played backwards sounds more like a wind organ, even though the magnitude of the Fourier transform of a sound and that of its time-reversed version are identical. Therefore, the study of how timbre is encoded must include temporal as well as spectral properties of the system. For speech, the temporal variations in timbre occur at rates of about 10 Hz, so this temporal dimension is distinct from, and much slower than, the acoustic frequencies that make up the sounds themselves. It is the extraction of the dynamic spectral envelope by the auditory cortex that we are concerned with. Because we are interested in timbre, we use pitchless, dynamic, broadband sounds as stimuli.
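The time-reversal argument can be checked numerically: reversing a real signal conjugates its discrete Fourier transform (up to a linear phase factor), so the magnitude spectrum is unchanged while the temporal envelope is not. A minimal NumPy sketch, in which a synthetic decaying harmonic tone stands in for a piano note (an assumption, not a recording):

```python
import numpy as np

fs = 16000                                   # sample rate in Hz (assumed)
t = np.arange(int(0.5 * fs)) / fs
# Hypothetical "piano-like" note: abrupt onset, exponential decay, three harmonics
note = np.exp(-t / 0.08) * sum(np.sin(2 * np.pi * 220 * h * t) / h for h in (1, 2, 3))
reversed_note = note[::-1]                   # slow build-up, abrupt offset

mag = np.abs(np.fft.rfft(note))
mag_rev = np.abs(np.fft.rfft(reversed_note))
print(np.allclose(mag, mag_rev))             # True: identical magnitude spectra

# The temporal envelopes differ: the amplitude peak moves from the start to the end
print(np.argmax(np.abs(note)), np.argmax(np.abs(reversed_note)))
```

Any analysis based only on the magnitude spectrum would therefore confuse the two sounds, which is why the temporal dimension has to be part of the characterization.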

2.3 Auditory Pathway (Monaural) 

The auditory pathway up to primary auditory cortex, ignoring structures usually considered dedicated to binaural aspects of sounds (such as localization), can be minimally described as follows. The vibrations of the tympanic membrane are mechanically transformed into a traveling wave in the cochlea, with a profile that depends on the frequency content of the acoustic spectrum. The vibrations of the basilar membrane are transformed by inner hair cells into patterns of neural activity in the auditory nerve. For practical purposes, we can think of the basilar membrane as a bank of 1/3-octave filters performing a time-windowed Fourier transform, with a time constant of about 30 ms. The auditory nerve projects to the Cochlear Nucleus, which contains a variety of cells with different properties. These cells project to the Lateral Lemniscus, then to the Inferior Colliculus, then to the Medial Geniculate Body in the Thalamus, and finally to the Auditory Cortex. As in all other sensory modalities, most forward projections are accompanied by strong back projections.
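The "1/3-octave filters with a time window of about 30 ms" picture can be sketched as a short-time Fourier transform whose power is pooled into 1/3-octave bands. This is only a rough stand-in for cochlear filtering (real auditory filters are not rectangular bands); the function name, band layout (centres at 125·2^(k/3) Hz) and parameters are assumptions.

```python
import numpy as np

def third_octave_spectrogram(x, fs, f_lo=125.0, n_bands=18, win_s=0.030):
    """Crude cochlear front end: 30 ms Hann-windowed FFT frames pooled into
    1/3-octave bands centred at f_lo * 2**(k/3). Returns (band power, centres)."""
    win = int(round(win_s * fs))
    hop = win // 2
    w = np.hanning(win)
    frames = np.stack([x[i : i + win] * w for i in range(0, len(x) - win + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(win, 1.0 / fs)
    centres = f_lo * 2.0 ** (np.arange(n_bands) / 3.0)
    bands = np.zeros((len(frames), n_bands))
    for k, fc in enumerate(centres):
        # Band edges one sixth of an octave either side of the centre
        mask = (freqs >= fc * 2 ** (-1 / 6)) & (freqs < fc * 2 ** (1 / 6))
        bands[:, k] = power[:, mask].sum(axis=1)
    return bands, centres

fs = 16000
t = np.arange(fs) / fs
bands, centres = third_octave_spectrogram(np.sin(2 * np.pi * 1000 * t), fs)
print(centres[np.argmax(bands.mean(axis=0))])   # 1000.0: the band centred on the tone
```

Because the band centres are equally spaced in log frequency, the band index plays the role of the position x along the cochlea used throughout the rest of the paper.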

Neurons at different stages of the auditory pathway respond to different time-scales. Neurons in the mammalian auditory nerve phase-lock to a pure tone up to frequencies of about 4 kHz: that is, they tend to fire at a specific phase of the tonal input, even when they fire in a sustained fashion at their maximum rate of about 200 spikes/second. In the cochlear nucleus, certain cells (so-called lockers) can phase-lock to tones with frequencies up to about 2 kHz. By the Inferior Colliculus, most cells phase-lock to variations in the stimulus only up to about 200 Hz, with some cells going up to 800 Hz. Finally, at the level of cortex, we have found that phase-locking to variations in the stimulus is usually limited to the order of 10 Hz, with a maximum of about 70 Hz. Note that characterizing single units and their temporal features may overlook other potential coding strategies based on population activity. In the cat's cochlea, about 3000 inner hair cells are innervated by about 50,000 auditory nerve fibers, and by the auditory cortex, activity is distributed over several million neurons.
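Phase locking is conventionally quantified by the vector strength: the length of the mean resultant vector of spike phases relative to the stimulus period, equal to 1 for perfect locking and near 0 for none. A minimal sketch; the function name and the synthetic spike trains are illustrative assumptions, not data from this study.

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Return (vector strength, mean phase in radians) of spikes relative to freq Hz."""
    phases = 2 * np.pi * freq * np.asarray(spike_times)
    resultant = np.mean(np.exp(1j * phases))
    return np.abs(resultant), np.angle(resultant)

rng = np.random.default_rng(1)
f = 500.0                                        # stimulus frequency (Hz)
locked = (np.arange(200) + 0.25) / f             # one spike per cycle, always a quarter cycle in
random_spikes = rng.uniform(0.0, 1.0, size=2000) # spike times unrelated to the stimulus

vs_locked, phase_locked = vector_strength(locked, f)
vs_random, _ = vector_strength(random_spikes, f)
print(round(vs_locked, 6), round(phase_locked, 3))   # 1.0 1.571 (pi/2, a quarter cycle)
print(vs_random < 0.1)                               # True
```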

Another important aspect of the organization of the auditory pathway is that cells tend to be organized tonotopically at each stage, inheriting the frequency decomposition performed by the basilar membrane, which maps frequency onto position approximately logarithmically. Up through AI, cells that are equally spaced along a certain axis (which depends on the structure) respond best to frequencies that are equally spaced on a logarithmic frequency axis.



3 Principles 
3.1 Guiding Principles

The guiding principle behind our research program is that cells behave like a linear system with respect to the spectral envelope. The test of linearity is that when cells are presented with a sound made up of the sum of several spectral envelopes, the response, measured assuming a rate code, is the sum of the responses to the individual envelopes. A response that is linear in frequency and time is characterized by a two-dimensional impulse response (or time-dependent response field) or, equivalently, by its Fourier transform, a two-dimensional transfer function. The extraction of this two-dimensional response field, a function of frequency and time, is the object of this paper.

It is helpful to remember that, because the cochlea performs in some sense a time-windowed Fourier transform of the incoming waveform along its length, it is useful to treat the frequency axis as a spatial axis rather than as the Fourier dual of the time axis. Since frequencies are mapped logarithmically along the cochlear axis, the natural unit along the spectral axis is $x = \log(f)$. Much of the research on which the present work is based has dealt with the spectral, time-independent aspect of response fields and their linearity.

3.2 Response Field and Linearity 

Initially ignoring the dimension of time (or taking a delta function for the temporal impulse response), the response of a cell with a response field $RF(x)$ to a sound with a spectral envelope $S(x)$ is given by $y = \int S(x)\,RF(x)\,dx$.¹ To incorporate time (allowing for more realistic temporal impulse response functions), we first limit our study to the case in which the temporal and spectral properties that characterize the cell's response are independent of each other (separable). The response of a cell is then characterized by two functions: $RF(x)$, which describes its spectral properties, and $IR(t)$, which describes its temporal properties. The response is then $y(t) = \left(\int S(x,t)\,RF(x)\,dx\right) * IR(t)$, where $*$ is the convolution operator. We will see that certain cells can be characterized in this way.

In the general situation, cells must be characterized by a full spectro-temporal description, i.e. a Spectro-Temporal Response Field, $STRF(x,t)$. In this case the response is given by $y(t) = \int S(x,t) *_t STRF(x,t)\,dx$, where $*_t$ denotes convolution in the $t$ direction (with multiplication in the $x$ direction).
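A discrete version of these two response models makes their relationship concrete: on a grid of x (octaves) and t, the general model sums a per-channel temporal convolution over channels, and with a separable kernel STRF(x,t) = RF(x)·IR(t) it reduces exactly to the separable formula. A minimal NumPy sketch; the grid spacing, the Gabor-shaped RF, the damped-oscillation IR and the random test envelope are assumptions for illustration.

```python
import numpy as np

def strf_response(S, strf, dx, dt):
    """General model: y(t) = sum over x of (S(x,.) *_t STRF(x,.))(t) dx, causal.

    S: (n_x, n_t) spectro-temporal envelope; strf: (n_x, n_k) kernel.
    """
    n_t = S.shape[1]
    y = np.zeros(n_t)
    for s_row, k_row in zip(S, strf):
        y += np.convolve(s_row, k_row)[:n_t] * dt   # temporal convolution per channel
    return y * dx                                   # integrate over x

def separable_response(S, rf, ir, dx, dt):
    """Separable model: y(t) = (integral of S(x,t) RF(x) dx) * IR(t)."""
    drive = rf @ S * dx                             # spectral integration first
    return np.convolve(drive, ir)[: S.shape[1]] * dt

dx, dt = 0.1, 0.001                                # octaves, seconds (assumed grid)
x = np.arange(50) * dx                             # 0 to 5 octaves
tk = np.arange(100) * dt                           # 100 ms kernel support
rf = np.exp(-((x - 2.0) ** 2) / 0.18) * np.cos(2 * np.pi * (x - 2.0))
ir = np.exp(-tk / 0.02) * np.sin(2 * np.pi * 10 * tk)
rng = np.random.default_rng(2)
S = rng.standard_normal((len(x), 500))             # arbitrary test envelope

y_general = strf_response(S, np.outer(rf, ir), dx, dt)
y_separable = separable_response(S, rf, ir, dx, dt)
print(np.allclose(y_general, y_separable))         # True
```

The separable form is much cheaper (one convolution instead of one per channel), which mirrors the experimental saving discussed later: separability reduces what has to be measured.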

In the following, it is useful to consider the Fourier transform of the two-dimensional impulse response function $STRF(-x,t)$, called the transfer function $T(\Omega,w)$, where we define $T(\Omega,w) = \mathcal{F}_{x,t}\left[STRF(-x,t)\right] = \iint STRF(-x,t)\,e^{-j2\pi(\Omega x + w t)}\,dx\,dt$. The coordinate dual to $x$ is $\Omega$, and the coordinate dual to $t$ is $w$.²

¹ This is the standard convention used in hearing and vision in defining the Response Field; it is related to the Spectral Impulse Response function, which is $RF(-x)$.

² The coordinate dual to $t$ is $w$, not $f$. This is because the spectro-temporal representation we are using is inspired by the cochlea's time-windowed Fourier transform of the original (acoustic) input signal. The time coordinate $t$ used at higher levels in the auditory pathway is much coarser than the acoustic time, roughly corresponding to a label for "which" cochlear time-window is being referred to.

Figure 4: Spectro-temporal envelope of a ripple moving downward in frequency, with $w = 3$ Hz and $\Omega = 0.6$ cycles/octave. Horizontal axis: time, 0 to 1000 ms.



3.3 Spectro-Temporal Response Field

Our general problem can be formulated as follows. Let $S(x,t)$ be the spectro-temporal envelope of the sound. Given the $STRF(x,t)$ of a neuron, we can predict its response to any $S(x,t)$. We obtain the STRF from measurements of the neuron's responses to a complete set of basis functions $S_{\Omega w}(x,t)$. A simple set of basis functions is $S_{\Omega w}(x,t) = \sin 2\pi(\Omega x + w t)$, where $S_{00}$ (no modulation) corresponds to a flat envelope of fixed loudness (i.e. noise). Any complete orthogonal basis would do, but the sinusoidal basis allows us to use the standard methods of Fourier analysis. Furthermore, the sinusoidal basis is robust against the non-linear distortions discussed below. We use these sinusoidal basis functions and call them 'ripples'. Accordingly, $\Omega$ is called the ripple frequency (in cycles/octave) and $w$ the ripple velocity (in cycles/second, or Hertz).

The most prominent non-linear distortions are half-wave rectification and compression. Half-wave rectification arises because spike rates cannot be negative (assuming the steady-state response to a flat spectrum is zero, which will be seen to be the case). Half-wave rectification of a sinusoidal firing rate does not affect the phase of the response, and its effect on the amplitude of the first Fourier component is a constant factor (1/2, independent of $\Omega$ and $w$). The distortion due to compression likewise does not affect the phase of the response.
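These claims can be checked with a short simulation: pass a sinusoidal drive through half-wave rectification and then a compressive nonlinearity, and measure the first Fourier component by projecting onto sine and cosine over whole cycles. The square-root compression is an assumed example; the underlying symmetry argument (for any memoryless, monotonic nonlinearity g, g(sin θ) = g(sin(π − θ)), which rules out a cosine term in the fundamental) applies to other compressive functions too.

```python
import numpy as np

def first_harmonic(r, w, t):
    """Amplitude a and phase phi of the best fit a*sin(2*pi*w*t + phi).

    Assumes uniform sampling over a whole number of cycles of w.
    """
    s = 2 * np.mean(r * np.sin(2 * np.pi * w * t))   # a*cos(phi)
    c = 2 * np.mean(r * np.cos(2 * np.pi * w * t))   # a*sin(phi)
    return np.hypot(s, c), np.arctan2(c, s)

w, phi0 = 4.0, 0.7                         # ripple velocity (Hz), response phase (rad)
t = np.arange(10 * 2000) / (2000 * w)      # 10 cycles, 2000 samples per cycle
drive = np.sin(2 * np.pi * w * t + phi0)   # linear (signed) response
rectified = np.maximum(drive, 0.0)         # firing rates cannot be negative
compressed = np.sqrt(rectified)            # hypothetical compressive stage

for name, r in [("linear", drive), ("rectified", rectified), ("compressed", compressed)]:
    amp, phase = first_harmonic(r, w, t)
    print(f"{name:10s} amplitude={amp:.3f} phase={phase:.3f}")
# Expected: phase ~0.7 in all three cases; amplitude 1.0 becomes ~0.5 after rectification
```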

3.4 Transfer Functions 

By measuring the response $y_{\Omega w}(t)$ of a cell to a ripple of ripple frequency $\Omega$ and ripple velocity $w$, we obtain the transfer function $T(\Omega,w)$ at one point in the $\Omega$–$w$ plane:

$$
\begin{aligned}
y_{\Omega w}(t) &= \iint dx'\,dt'\; STRF(x',t')\,\sin 2\pi\left(\Omega x' + w(t-t')\right) \\
&= \operatorname{Im} \iint dx'\,dt'\; STRF(x',t')\,e^{j2\pi(\Omega x' + w(t-t'))} \\
&= \operatorname{Im}\left[ e^{j2\pi w t} \iint dx'\,dt'\; STRF(x',t')\,e^{j2\pi(\Omega x' - w t')} \right] \\
&= \operatorname{Im}\left[ e^{j2\pi w t}\, T(\Omega,w) \right] = |T(\Omega,w)|\,\sin\left(2\pi w t + \Phi(\Omega,w)\right)
\end{aligned}
\qquad (1)
$$

Figure 5: The $\Omega$–$w$ plane. The value of the transfer function at a point in quadrant 1 is the complex conjugate of the value at the corresponding reflected point in quadrant 3 (and similarly for the quadrant pair 2 and 4). The ripple in Figure 4 corresponds to a pair of points in quadrants 1 and 3.



In this way, we derive the amplitude $|T(\Omega,w)|$ and phase $\Phi(\Omega,w)$ of the complex transfer function $T(\Omega,w)$ by measuring the amplitude and phase of the (real) response of the cell. By the definition of the transfer function, its inverse Fourier transform recovers the STRF of the cell (reflected in $x$): $STRF(-x,t) = \mathcal{F}^{-1}_{\Omega,w}\left[T(\Omega,w)\right]$.
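Equation (1) can be verified numerically: convolve a toy STRF with a moving ripple, fit the amplitude and phase of the steady-state response, and compare them with T(Ω, w) computed directly as the double integral in Eq. (1). The separable Gabor-shaped STRF, the grid and the ripple parameters are illustrative assumptions, not measured data.

```python
import numpy as np

dx, dt = 0.05, 0.001
x = np.arange(100) * dx                  # 0 to 5 octaves
tk = np.arange(200) * dt                 # 200 ms kernel support
rf = np.exp(-((x - 2.5) ** 2) / (2 * 0.3**2)) * np.cos(2 * np.pi * (x - 2.5))
ir = np.exp(-tk / 0.03) * np.sin(2 * np.pi * 8 * tk)
strf = np.outer(rf, ir)                  # toy STRF(x, t)

Omega, w = 0.8, 4.0                      # ripple frequency (cyc/oct), velocity (Hz)

# Transfer function from Eq. (1): sum of STRF * exp(j 2 pi (Omega x - w t)) dx dt
T = np.sum(strf * np.exp(1j * 2 * np.pi * (Omega * x[:, None] - w * tk[None, :]))) * dx * dt

# Simulated response to the ripple sin 2 pi (Omega x + w t), with 200 ms of pre-history
n_pre, n = len(tk), 1000                 # steady state: 1 s = 4 whole cycles of w
ts = np.arange(-n_pre, n) * dt
S = np.sin(2 * np.pi * (Omega * x[:, None] + w * ts[None, :]))
y = sum(np.convolve(S[i], strf[i])[: len(ts)] for i in range(len(x))) * dx * dt
y, t = y[n_pre:], ts[n_pre:]             # keep samples that have full kernel history

# Fit y(t) = |T| sin(2 pi w t + Phi) by projection onto sine and cosine
s = 2 * np.mean(y * np.sin(2 * np.pi * w * t))
c = 2 * np.mean(y * np.cos(2 * np.pi * w * t))
amp, phase = np.hypot(s, c), np.arctan2(c, s)
print(amp, abs(T))                       # equal up to rounding
print(phase, np.angle(T))                # equal up to rounding
```

Repeating this for each (Ω, w) of interest fills in the transfer function one point at a time, which is exactly what the experimental paradigm does with real responses in place of the simulated y(t).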

Because $STRF(x,t)$ is real but $T(\Omega,w)$ is complex, there is a complex-conjugate symmetry,

$$T(\Omega,w) = T^*(-\Omega,-w), \qquad (2)$$

which holds for the Fourier transform of any real function of $x$ and $t$.
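The symmetry in Eq. (2) is easy to confirm for any real array with the discrete Fourier transform, where negative indices wrap around modulo the array length. A small NumPy check; the random array stands in for a sampled STRF (an assumption for illustration).

```python
import numpy as np

rng = np.random.default_rng(3)
strf = rng.standard_normal((16, 32))     # any real array of (x, t) samples
T = np.fft.fft2(strf)

# Index arrays for (-k mod M, -l mod N): the point reflected through the origin
neg_k = (-np.arange(T.shape[0])) % T.shape[0]
neg_l = (-np.arange(T.shape[1])) % T.shape[1]
T_reflected = T[np.ix_(neg_k, neg_l)]

print(np.allclose(T, np.conj(T_reflected)))  # True: T(Omega, w) = T*(-Omega, -w)
```

In practice this means only half of the Ω–w plane (for example w > 0) carries independent information.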



3.5 Full Separability 

Many cells possess transfer functions that are fully separable, i.e. the ripple transfer function factorizes into a function of $\Omega$ and a function of $w$ over all quadrants: $T(\Omega,w) = F(\Omega)\,G(w)$. This implies that $STRF(x,t)$ is spectrum-time separable: $STRF(x,t) = RF(x)\,IR(t)$. In this case, we only need to measure the transfer function for all $\Omega$ at an arbitrary $w$, and for all $w$ at an arbitrary $\Omega$. Moreover, $F(\Omega)$ and $G(w)$ are each complex-conjugate symmetric (because $RF(x)$ and $IR(t)$ are real), so we need only consider their positive arguments. This dramatically decreases the number of measurements needed to characterize the STRF.
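A worked form of this shortcut, following directly from the factorization: one section measured at a fixed ripple velocity $w_0$ and one measured at a fixed ripple frequency $\Omega_0$ determine every other value, provided $T(\Omega_0, w_0) \neq 0$:

$$T(\Omega,w) = F(\Omega)\,G(w) = \frac{F(\Omega)\,G(w_0)\;F(\Omega_0)\,G(w)}{F(\Omega_0)\,G(w_0)} = \frac{T(\Omega,w_0)\;T(\Omega_0,w)}{T(\Omega_0,w_0)}$$

Since measurement noise in the denominator is amplified, a reference point $(\Omega_0, w_0)$ where $|T|$ is large is the natural choice.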
3.6 Quadrant Separability

For cells that are not fully separable, we have found that they are still quadrant separable, i.e. within each quadrant the transfer function $T(\Omega,w)$ can be written as the product of two independent functions:

$$T(\Omega,w) = \begin{cases} F_1(\Omega)\,G_1(w), & \Omega > 0,\ w > 0 \\ F_2(\Omega)\,G_2(w), & \Omega < 0,\ w > 0 \end{cases} \qquad (3)$$

where the subscript 1 indicates the $\Omega > 0, w > 0$ quadrant, and the subscript 2 the $\Omega < 0, w > 0$ quadrant. Note that, by reality of the STRF, the transfer function in quadrants 3 ($\Omega < 0, w < 0$) and 4 ($\Omega > 0, w < 0$) is the complex conjugate of that in quadrants 1 and 2, respectively. In this case, the STRF is not separable in spectrum and time, but is the linear superposition of two functions, one with support only in quadrants 1 and 3, and one with support only in quadrants 2 and 4.

3.7 Confirming Separability 

Separability is assessed by comparing sections of the measured transfer function taken along parallel lines of constant $\Omega$ or constant $w$. If the sections differ only by a constant amplitude and phase factor, then their shape is independent of the perpendicular variable, and the transfer function is therefore separable. If, in addition, the section of the transfer function is complex-conjugate symmetric about zero, then the transfer function is fully separable. Otherwise, the transfer function is merely quadrant separable.
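The comparison of parallel sections can be phrased as a least-squares fit of a single complex factor (one overall gain and one overall phase) mapping one section onto the other; a small relative residual supports separability. A minimal sketch; the function name, the synthetic transfer functions and any threshold one would apply to the residual are assumptions.

```python
import numpy as np

def section_mismatch(sec_a, sec_b):
    """Fit sec_b ~ c * sec_a with a single complex c; return (c, relative residual)."""
    sec_a, sec_b = np.asarray(sec_a), np.asarray(sec_b)
    c = np.vdot(sec_a, sec_b) / np.vdot(sec_a, sec_a)   # least-squares complex factor
    resid = np.linalg.norm(sec_b - c * sec_a) / np.linalg.norm(sec_b)
    return c, resid

w = np.arange(4.0, 25.0, 4.0)                            # ripple velocities (Hz)
G = np.exp(-1j * 2 * np.pi * w * 0.015) / (1 + (w / 10) ** 2)   # toy temporal factor
F = {0.4: 0.8 * np.exp(1j * 0.5), 0.8: 1.2 * np.exp(1j * 2.0)}  # toy F at two Omegas

# Separable case: sections at Omega = 0.4 and 0.8 share the same G(w)
_, r_sep = section_mismatch(F[0.4] * G, F[0.8] * G)
# Non-separable case: the delay changes with ripple frequency
nonsep_b = F[0.8] * np.exp(-1j * 2 * np.pi * w * 0.030) / (1 + (w / 10) ** 2)
_, r_non = section_mismatch(F[0.4] * G, nonsep_b)
print(r_sep < 1e-12, r_non > 0.1)                        # True True
```

With real data the residual will never be zero, so a decision requires comparing it against the residual expected from measurement noise alone.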

3.8 Confirming Linearity 

The method we use to characterize cortical cells depends on their being linear, so linearity must be assessed. To this end, we measure the transfer function of a cell with single ripples (as described above), and then test how well it predicts the response of the cell to linear combinations of ripples; the quality of these predictions quantifies the degree of linearity of the response.

Predicting the response of the cell to linear combinations of ripples for which the transfer 
function was not measured directly, but only inferred via separability, verifies both linearity 
and separability simultaneously. 
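Under linearity, and by Eq. (1), the predicted response to a sum of ripples is the sum of the single-ripple responses, each shifted by its own stimulus phase: y(t) = Σ_i a_i |T(Ω_i, w_i)| sin(2π w_i t + Φ(Ω_i, w_i) + ψ_i), where a_i and ψ_i are the modulation amplitude and initial phase of component i. The sketch below computes that prediction and scores it with the correlation coefficient, which is one possible goodness-of-fit measure (an assumption, not necessarily the authors' choice). The toy transfer function and the "measured" response are hypothetical.

```python
import numpy as np

def predict_linear(T_of, components, t):
    """Sum of single-ripple responses a*|T| sin(2 pi w t + Phi + psi).

    T_of(Omega, w) -> complex transfer-function value (from measurements);
    components: iterable of (a, Omega, w, psi) describing the ripple mixture.
    """
    y = np.zeros_like(t)
    for a, Omega, w, psi in components:
        T = T_of(Omega, w)
        y += a * np.abs(T) * np.sin(2 * np.pi * w * t + np.angle(T) + psi)
    return y

def T_toy(Omega, w):
    """Hypothetical transfer function: center at x_m = 2 octaves, 20 ms delay."""
    gain = np.exp(-((abs(Omega) - 0.8) ** 2) / 0.3)
    return gain * np.exp(1j * 2 * np.pi * (Omega * 2.0 - w * 0.020))

t = np.arange(2000) / 1000.0
mixture = [(0.5, 0.4, 4.0, 0.0), (0.5, 0.8, 12.0, 1.0)]
predicted = predict_linear(T_toy, mixture, t)

rng = np.random.default_rng(4)
measured = predicted + 0.05 * rng.standard_normal(t.size)   # stand-in for data
r = np.corrcoef(predicted, measured)[0, 1]
print(r > 0.9)                                              # True for this stand-in
```

If the transfer-function values used here came only from separability-based reconstruction rather than direct measurement, a good prediction would support linearity and separability together, as the text notes.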

3.9 Characterizing the Response 

The functions $F(\Omega)$ and $G(w)$ are theoretically unconstrained. Physiologically, however, there are constraints on the type of functions they may be. For instance, because $F(\Omega)$ is the Fourier transform of $RF(x)$, which is localized around a center frequency ($f_m$ in frequency space, $x_m = \log f_m$ in logarithmic frequency space), the phases of $F(\Omega)$ must interfere constructively at $x_m$, and the amplitude of $F(\Omega)$ must be band limited. See, e.g., Figure 2 for examples of RFs, each of which is band limited and centered at a different $x_m$.
3.9.1 Amplitude of the response

The amplitude of the ripple frequency transfer function $F(\Omega)$ reaches a maximum at $\Omega_m \approx (2BW)^{-1}$, where $BW$ is the excitatory bandwidth of the RF in octaves, and then decreases. At higher ripple frequencies, the modulations of the ripple's spectral envelope cancel when integrated against the (more slowly varying) RF; at ripple frequencies lower than $\Omega_m$, the ripple's envelope is nearly constant over the width of the RF, including any negative (inhibitory) sidebands, so excitation and inhibition partly cancel and the integral has a smaller magnitude. Similarly, the amplitude of the ripple velocity transfer function $G(w)$ has a maximum at $w_m \approx (2BW_t)^{-1}$, where $BW_t$ is the temporal excitatory width of the IR. Because, under anesthesia, the steady-state cortical response to any sound with a constant envelope has a rate of zero, we get $G(0) = \int IR(t)\,dt = 0$.

3.9.2 Phase of the response 

Because neurons in the auditory pathway are tonotopically arranged, each cell has a frequency around which its RF is centered, independent of the ripple frequency $\Omega$. Since the derivative of the phase of $F(\Omega)$ gives the mean frequency of the response for that ripple frequency, the phase of the transfer function is linear in $\Omega$ (plus a constant).³ Similarly, because the IR is causal, there is a group delay, and because of the biological nature of the neural process, the delay is roughly independent of ripple velocity, which gives a constant derivative of the phase of $G(w)$.

Therefore the phase of the transfer function in each quadrant, $\Phi^q(\Omega,w)$ with $q \in \{1,2\}$ (see Equation (1)), can be written as $\Phi^q(\Omega,w) = 2\pi\Omega x_m^q - 2\pi w \tau^q + \phi^q$, where $x_m^q = \log f_m^q$ is the mean frequency around which the RF is centered, $\tau^q$ is the delay of the IR, defined as the mean of the envelope of the IR,⁴ and $\phi^q$ is a constant phase angle. Tonotopy guarantees that $x_m^1 \approx x_m^2$, but depending on the precise inputs of the neuron they may not agree completely, so that we can have different $x_m$ for upward and downward moving sounds. Similarly, $\tau^1 \approx \tau^2$, but equality is not required. The reality of the response enforces complex-conjugate symmetry of the transfer function, allowing these six independent parameters to describe the phase everywhere in the $\Omega$–$w$ plane.

A convenient convention is to define constant phase angles $\theta$ and $\phi$ such that $\phi^1 = \theta + \phi$ and $\phi^2 = \theta - \phi$. With the complex-conjugate symmetry, and if the STRF is separable, $\phi$ is the symmetry parameter of the RF and $\theta$ is the symmetry parameter of the IR (in Figure 2, $\phi = 90°$ for the left cell and $\phi = 0°$ for the right cell). Even in the non-separable case, we will still call $\phi$ the RF symmetry and $\theta$ the IR polarity. If one restricts measurements to one quadrant plus the $w$-axis (recall from above that the transfer function vanishes on the $\Omega$-axis), one can measure $\phi^q$ in that quadrant and, on the $w$-axis, the average of $\phi^1$ and $\phi^2$, i.e. $\theta$. There is an ambiguity in fixing $\theta$ and $\phi$ that allows us to restrict $\theta$ to lie between 0° and 180°, while $\phi$ ranges over the full −180° to +180°.
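The phase model suggests a simple estimation recipe: unwrap the measured phase along a section of constant w in one quadrant and fit a straight line in Ω (slope 2πx_m^q), do the same along a section of constant Ω (slope −2πτ^q), then recover φ^q from an intercept and split φ^1, φ^2 into θ and φ. A minimal sketch on synthetic, noise-free phases; the parameter values and function names are assumptions.

```python
import numpy as np

def fit_line_phase(coord, phase):
    """Slope and intercept of a straight-line fit to the unwrapped phase."""
    slope, intercept = np.polyfit(coord, np.unwrap(phase), 1)
    return slope, intercept

def theta_phi(phi1, phi2):
    """Split quadrant phase constants into IR polarity theta and RF symmetry phi."""
    theta, phi = 0.5 * (phi1 + phi2), 0.5 * (phi1 - phi2)
    # (theta + pi, phi + pi) gives the same phi1, phi2 mod 2*pi: use it to put theta in [0, pi)
    k = np.floor(theta / np.pi)
    theta, phi = theta - k * np.pi, phi - k * np.pi
    return theta, np.angle(np.exp(1j * phi))     # phi wrapped to (-pi, pi]

# Hypothetical quadrant-1 cell: x_m = 2 octaves, tau = 20 ms, phi^1 = -80 degrees
x_m, tau, phi_1 = 2.0, 0.020, np.deg2rad(-80.0)

def measured_phase(Om, w):
    """Wrapped phase 2*pi*Om*x_m - 2*pi*w*tau + phi^1, as a measurement would return it."""
    return np.angle(np.exp(1j * (2 * np.pi * Om * x_m - 2 * np.pi * w * tau + phi_1)))

Omega0, w0 = 0.8, 8.0
Om_line = np.linspace(0.2, 1.6, 8)               # section at constant w = w0
w_line = np.arange(4.0, 25.0, 4.0)               # section at constant Omega = Omega0
s_Om, b_Om = fit_line_phase(Om_line, measured_phase(Om_line, w0))
s_w, _ = fit_line_phase(w_line, measured_phase(Omega0, w_line))

x_m_hat = s_Om / (2 * np.pi)
tau_hat = -s_w / (2 * np.pi)
phi_1_hat = np.angle(np.exp(1j * (b_Om + 2 * np.pi * w0 * tau_hat)))
print(x_m_hat, tau_hat, np.rad2deg(phi_1_hat))   # ~2.0, ~0.02, ~-80

theta, phi = theta_phi(phi_1_hat, np.deg2rad(160.0))   # phi^2 = 160 degrees, hypothetical
print(np.rad2deg(theta), np.rad2deg(phi))              # ~40, ~-120
```

Unwrapping only works if adjacent samples differ in phase by less than π, so the sampling step along Ω must be small compared with 1/(2x_m) (here 0.2 cyc/oct against 0.25).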

The phase curve does not truly have a discontinuity across the $w$-axis. For very small ripple frequencies, the response becomes increasingly independent of the best frequency of the cell, allowing the phase to change continuously from its linear form to $\theta$ on the axis. At large ripple frequencies the slope may also diverge from its constant value, but at these ripple frequencies the amplitude is small, so the particular values of the phase do not contribute. Similarly, the phase of $G(w)$ has a constant slope over its intermediate range but changes continuously near the $\Omega$-axis; since the amplitude is zero on that axis, this is of little consequence.

³ This is completely analogous to the derivative of the phase of the Fourier transform of a signal, $d\phi/dw$, giving the characteristic delay (for that frequency), or to the derivative of the angular frequency in a dispersion relation, $dw/dk$, giving the group velocity (for that wave number). See, e.g., Papoulis and Cohen.

⁴ The envelope $E(t)$ of a function with localized support can be defined as the modulus of the function plus $j$ times its Hilbert transform. The mean of the envelope is then computed as $\langle t \rangle = \int dt\, t\, E(t)^2$, with $E$ normalized so that $\int dt\, E(t)^2 = 1$. See, e.g., Cohen.

Figure 6: The phase of the transfer function can be described by 6 parameters over most of the relevant regions of the $\Omega$–$w$ plane.

Figure 7: Phase curves. The slope is constant for most of the curves after (left) $2\pi w\tau^q$ has been removed from the corresponding quadrants, corresponding to a center frequency that is independent of the ripple frequency, and (right) after $2\pi\Omega x_m^q$ has been removed, corresponding to a delay that is independent of ripple velocity. At very small ripple frequencies (long ripple periodicity), center frequency is less meaningful, and similarly for small ripple velocity and delay, respectively. At large ripple velocity the slope asymptotes to the signal-front delay, but when this occurs the small amplitude of the transfer function makes it difficult to measure the phase (see Dong and Atick, and Papoulis).
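The envelope-based delay of footnote 4 can be computed from the analytic signal. NumPy has no Hilbert-transform routine, so this sketch builds the analytic signal in the frequency domain by zeroing negative frequencies (the standard FFT construction, also used by scipy.signal.hilbert). The Gabor-like impulse response centred at 50 ms is an assumption.

```python
import numpy as np

def analytic_signal(x):
    """x + j*Hilbert{x}, computed by zeroing the negative-frequency half of the FFT."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def envelope_mean_time(ir, dt):
    """<t> = integral of t E(t)^2 dt, with E normalised to unit energy."""
    t = np.arange(len(ir)) * dt
    E2 = np.abs(analytic_signal(ir)) ** 2
    return np.sum(t * E2) / np.sum(E2)

dt = 0.0005
t = np.arange(2000) * dt                             # 1 s window
# Hypothetical IR: 60 Hz oscillation under a Gaussian envelope centred at 50 ms
ir = np.exp(-((t - 0.050) ** 2) / (2 * 0.008**2)) * np.cos(2 * np.pi * 60 * t)
print(envelope_mean_time(ir, dt))                    # about 0.05 s
```

Using the envelope rather than the raw IR makes the delay estimate insensitive to the IR's polarity and oscillation phase, which the phase model assigns to θ instead.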




4 Analytical Methods 
Figure 8: Left: a time slice of the stimulus: 101 tones equally spaced along the logarithmic frequency axis (in octaves). This ripple has a ripple frequency $\Omega$ of 0.4 cyc/oct with zero phase and a linear modulation of 50%, plotted against an arbitrary intensity axis (see Equation (4)). Right: the spectral profile changes as a function of time, giving a moving ripple, here with positive ripple velocity (since the phase increases as a function of time).

4.1 The Ripple Stimulus 

The auditory stimulus we use has a sinusoidal spectral profile at any instant in time. Since it would be hard to generate noise and then shape it with filters, we generate ripples over a range of 5 octaves by taking 101 tones with logarithmically spaced (temporal) frequencies and random (temporal) phases. The amplitude $S(x,t)$ of each tone of frequency $f$, with $x = \log_2(f/f_0)$ and $f_0$ the lower edge of the spectrum, is then adjusted as

$$S(x,t) = L\left(1 + \Delta A \cdot \sin\left(2\pi(\Omega x + w t) + \Phi\right)\right) \qquad (4)$$

for a linear modulation. $L$ is the overall base level of the stimulus and is adjusted to a level typically 10 to 15 dB above the lower threshold of the cell, as determined with pure tones at the tonal best frequency. The overall level of a single-ripple stimulus is calculated from the level of its single-frequency components: thus, a flat ripple of level $L_1$ is composed of 101 components, each at $L_1 - 10\log_{10}(101) \approx L_1 - 20$ dB.
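Equation (4) translates directly into a synthesis routine: evaluate the envelope for each of the 101 log-spaced tones at every sample and use it as that tone's amplitude, with random carrier phases. A minimal sketch; the sample rate, the lower edge f0 = 250 Hz, the duration, and applying S as a linear amplitude (following the "linear modulation" wording) are assumptions.

```python
import numpy as np

def ripple_stimulus(Omega, w, delta_A=0.9, Phi=0.0, L=1.0, f0=250.0,
                    n_tones=101, n_oct=5.0, dur=0.5, fs=32000, seed=0):
    """Moving ripple per Eq. (4): tone amplitudes L*(1 + dA*sin(2pi(Omega x + w t) + Phi))."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    x = np.linspace(0.0, n_oct, n_tones)           # octaves above f0
    freqs = f0 * 2.0 ** x                           # logarithmically spaced tones
    phases = rng.uniform(0, 2 * np.pi, n_tones)     # random carrier phases
    env = L * (1 + delta_A * np.sin(2 * np.pi * (Omega * x[:, None] + w * t[None, :]) + Phi))
    carriers = np.sin(2 * np.pi * freqs[:, None] * t[None, :] + phases[:, None])
    return t, x, env, (env * carriers).sum(axis=0)

t, x, env, sound = ripple_stimulus(Omega=0.4, w=4.0, delta_A=0.5)
print(sound.shape, env.min(), env.max())           # (16000,) ~0.5 ~1.5
print(10 * np.log10(101))                          # ~20.04 dB per-component offset
```

Because the tones are log-spaced rather than harmonic, the summed waveform has no common period, which is what makes the stimulus pitchless.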

Five parameters are sufficient to characterize the ripple stimulus: 

• The ripple frequency $\Omega$, in cycles/octave;

• The ripple velocity $w$, in Hz (a positive value of both $w$ and $\Omega$ corresponds to a ripple whose envelope travels towards the low frequencies);

• The level, or base loudness, of the ripple;

• The amplitude of the modulation of the ripple around the base;

• The ripple's initial phase.

Since the tones that make up a ripple are logarithmically spaced rather than harmonically related, the ripple has no well-defined pitch.
Figure 9: The top panel represents the spectral envelope of the stimulus at a given instant, against an arbitrary intensity axis. For the two cells represented in the middle panel (with $IR(t) = \delta(t)$), one (unbroken) with the RF centered on low frequencies ($x_m = 1$, asymmetric with $\phi = 90°$) and the other (broken) with the RF centered on high frequencies ($x_m = 4$, symmetric with $\phi = 0°$), the expected responses to a 4 Hz ripple are shown in the bottom panel (unbroken and broken, respectively), against some measure of the response, for instance spikes/sec or the intracellular potential. In our case, the actual response is half-wave rectified and measured in the form of a spike count, so the bottom panel should really be seen as a spiking probability, which can be estimated from the responses of the cell to many presentations of the same stimulus.



4.2 Data Analysis 

In this section, we illustrate the data analysis we apply with the help of a simulation; to keep the graphs one-dimensional, we assume that $IR(t) = \delta(t)$ in Figure 9 and Figure 10.

We use two paradigms to obtain the transfer function of a cell. First, we choose a ripple 
frequency and present the cell with ripples of varying ripple velocities (typically, -24 Hz to 24 
Hz in cortex). Then, for a fixed ripple velocity, we present the cell with ripples of varying ripple 
frequencies (typically, from -1.6 to 1.6 cyc/oct). 

As indicated for a 4 Hz ripple in Figure 9, the response of a cell as a function of time is modulated at the same (temporal) frequency as the ripple velocity of the stimulus. Therefore, we need only extract the phase and the amplitude of the response. The resulting transfer function for the same two cells is shown in Figure 10. We have presented ripples to the idealized cells shown in panel B. The amplitude of the response as a function of ripple frequency is shown in panel C, and the phase of the response in panel D. Note that the phase intercept is 0° for the symmetric cell and 90° for the antisymmetric cell.
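The extraction step can be sketched as follows: pool spike times from repeated presentations into a PSTH and project it onto sine and cosine at the ripple velocity over whole cycles. The synthetic spikes come from an inhomogeneous Poisson process with a half-wave-rectified sinusoidal rate, generated by thinning; the rate, trial count and bin width are assumptions.

```python
import numpy as np

def psth_first_harmonic(spike_trials, w, dur, bin_s=0.005):
    """Amplitude (spikes/s) and phase (rad) of the w-Hz component of the PSTH.

    Fits a*sin(2*pi*w*t + phase); dur should contain a whole number of cycles of w.
    """
    edges = np.arange(0.0, dur + bin_s / 2, bin_s)
    counts = sum(np.histogram(tr, bins=edges)[0] for tr in spike_trials)
    rate = counts / (len(spike_trials) * bin_s)          # spikes/s
    centres = 0.5 * (edges[:-1] + edges[1:])
    s = 2 * np.mean(rate * np.sin(2 * np.pi * w * centres))
    c = 2 * np.mean(rate * np.cos(2 * np.pi * w * centres))
    return np.hypot(s, c), np.arctan2(c, s)

# Hypothetical cell: rate = 100 * max(0, sin(2 pi w t + 1.0)) spikes/s
rng = np.random.default_rng(5)
w, phase_true, r_max, dur = 4.0, 1.0, 100.0, 1.0
trials = []
for _ in range(50):
    cand = np.sort(rng.uniform(0.0, dur, rng.poisson(r_max * dur)))   # homogeneous at r_max
    keep = rng.uniform(size=cand.size) < np.maximum(0.0, np.sin(2 * np.pi * w * cand + phase_true))
    trials.append(cand[keep])                                          # thinning

amp, phase = psth_first_harmonic(trials, w, dur)
print(amp, phase)   # roughly 50 spikes/s (r_max / 2 after rectification) and 1.0 rad
```

Consistent with the discussion of non-linearities, the rectified rate still yields the drive's phase, while its amplitude is scaled by the constant factor 1/2.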

In the corresponding $\Omega$–$w$ space, the ripple of Figure 4 corresponds to a pair of points. Therefore, to measure the complete ripple transfer function of a cell, we need to measure its response to all possible ripples, as shown in Figure 11. Note that since cells in cortex respond only to transient stimuli, it is not necessary to present stimuli along the $w = 0$ axis.

Figure 10: The sounds with the spectra shown in A (ripples with ripple frequencies of 0 (flat spectrum), 0.4 and 0.8 cycles/octave) are presented at various phases to the two cells in B, as in Figure 9. The amplitude (for instance in spikes/sec) (C) and phase (D) of the best fit to the response are shown. Horizontal axis: ripple frequency, 0 to 2 cycles/octave.



4.3 Separability 

We have shown previously that within each quadrant, actual ripple transfer functions are separable:⁵ for two fixed values of $\Omega$, the transfer function as a function of $w$ changes between the two only by an overall scale factor and an overall phase. The same is true when the roles of $\Omega$ and $w$ are interchanged. Hence we only need to sample one line in each direction within each quadrant, as shown in Figure 12.

Without separability, whether full or quadrant, it would be extremely difficult to characterize a cell by its transfer function. Experimentally, given the time required to measure one point of the transfer function, measuring the transfer function at the points indicated in Figure 12 is feasible, whereas measuring it at all the points indicated in Figure 11 is not.
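The sampling scheme can be sketched in code: in each of quadrants 1 and 2, one constant-w line and one constant-Ω line through a shared reference point give a rank-one estimate of that quadrant, and Eq. (2) then supplies quadrants 3 and 4. The toy transfer function (with a different delay in each quadrant, so it is quadrant separable but not fully separable), the grid and the reference point are assumptions.

```python
import numpy as np

Om = np.round(np.arange(0.2, 1.7, 0.2), 1)     # |Omega| samples (cyc/oct)
w = np.arange(4.0, 25.0, 4.0)                  # w > 0 samples (Hz)

def T_true(Omega, w):
    """Hypothetical quadrant-separable transfer function, defined for w > 0 only."""
    tau = np.where(Omega > 0, 0.015, 0.025)    # different delay in quadrants 1 and 2
    gain = np.exp(-((np.abs(Omega) - 0.8) ** 2) / 0.3) / (1 + (w / 10) ** 2)
    return gain * np.exp(1j * 2 * np.pi * (2.0 * Omega - w * tau))

def quadrant_from_lines(row, col, ref):
    """Rank-one quadrant estimate: T(Om, w) = T(Om, w0) * T(Om0, w) / T(Om0, w0)."""
    return np.outer(row, col) / ref

i0, j0 = 3, 1                                  # reference point: |Omega0| = 0.8, w0 = 8 Hz
quads = {}
for sign in (+1, -1):                          # quadrant 1 (Omega > 0), quadrant 2 (Omega < 0)
    O = sign * Om
    row = T_true(O, w[j0])                     # "measured" along constant w0
    col = T_true(O[i0], w)                     # "measured" along constant Omega0
    quads[sign] = quadrant_from_lines(row, col, T_true(O[i0], w[j0]))

# Quadrants 3 and 4 from Eq. (2): T(-Omega, -w) = conj(T(Omega, w))
q3 = np.conj(quads[+1])                        # values at (-Omega, -w)
q4 = np.conj(quads[-1])                        # values at (+Omega, -w)

W, O1 = np.meshgrid(w, Om)                     # O1[i, j] = Om[i], W[i, j] = w[j]
print(np.allclose(quads[+1], T_true(O1, W)), np.allclose(quads[-1], T_true(-O1, W)))  # True True
```

Here two lines of 8 and 6 points per quadrant replace a full 8 by 6 grid; the saving grows with the resolution of the grid.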



⁵ Strictly speaking, we have shown it only for the first quadrant, i.e. for down-moving ripples.
Figure 11: To measure the complete ripple transfer function, we have to measure the response of the cell to all the ripples represented by large circles above. The smallest circles correspond to redundant ripples, as inspection of Eq. (2) and Figure 5 shows.

Figure 12: Since we found experimentally that cells have separable transfer functions within each quadrant, it is enough to measure the transfer function along two orthogonal lines in each quadrant.



4.4 Linearity 

Linearity is confirmed by comparing the response to combinations of ripples with the response predicted by summing the responses to the individual ripples, i.e., the corresponding values of the transfer function. A combination of ripples is computed such that its base loudness is the same as that of the individual ripples, and the amplitude of the modulation is scaled as in Equation (4). For example, to present the combination of two ripples (whose parameters carry the subscripts 1 and 2), we compute

$$B(x,t) = B_1 \sin\big(2\pi(\Omega_1 x + \omega_1 t) + \Phi_1\big) + B_2 \sin\big(2\pi(\Omega_2 x + \omega_2 t) + \Phi_2\big).$$

For a modulation depth $\Delta A$, the envelope is then (in the manner of Equation (4)) $L\,\big(1 + \Delta A \cdot B(x,t)/\max|B|\big)$, where $L$ is the base intensity level. The sound is generated from the envelope using 101 tones spanning 5 octaves, with logarithmically spaced (temporal) frequencies and random (temporal) phases.
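The synthesis just described can be sketched as follows. This is an illustration under stated assumptions, not the authors' stimulus code: the lowest tone `f0 = 250` Hz, the sample rate, the duration, `delta_a = 0.9`, and the choice to apply the envelope as a linear amplitude on each tone are all hypothetical. The 101 tones over 5 octaves, the random carrier phases, and the $L(1 + \Delta A \cdot B/\max|B|)$ scaling follow the text.

```python
import numpy as np

def ripple_combination(ripples, delta_a=0.9, base_level=1.0, f0=250.0, n_tones=101,
                       n_octaves=5.0, duration=0.5, fs=32000, seed=0):
    """Synthesize a sum of moving ripples on a log-spaced tone complex.

    ripples: list of (B_k, Omega_k [cyc/oct], omega_k [Hz], Phi_k [rad]).
    f0, fs, duration, delta_a and seed are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * fs)) / fs
    x = np.linspace(0.0, n_octaves, n_tones)        # tone positions, octaves above f0
    freqs = f0 * 2.0 ** x                           # logarithmically spaced carriers
    phases = rng.uniform(0.0, 2 * np.pi, n_tones)   # random carrier phases

    # B(x, t) = sum_k B_k sin(2 pi (Omega_k x + omega_k t) + Phi_k)
    B = np.zeros((n_tones, t.size))
    for b_k, omega_spec, omega_temp, phi_k in ripples:
        B += b_k * np.sin(2 * np.pi * (omega_spec * x[:, None] + omega_temp * t[None, :]) + phi_k)

    # Envelope L * (1 + delta_A * B / max|B|), bounded by L * (1 - delta_A) and L * (1 + delta_A).
    envelope = base_level * (1.0 + delta_a * B / np.max(np.abs(B)))

    # Each carrier is amplitude-modulated by its own row of the envelope, then all are summed.
    carriers = np.sin(2 * np.pi * freqs[:, None] * t[None, :] + phases[:, None])
    return t, envelope, np.sum(envelope * carriers, axis=0)

# Two ripples of 0.4 cyc/oct moving at 12 Hz and -4 Hz, the pair used in Figure 17A.
t, env, sound = ripple_combination([(1.0, 0.4, 12.0, 0.0), (1.0, 0.4, -4.0, 0.0)])
```

Normalizing by $\max|B|$ rather than by the sum of the $B_k$ keeps the combined envelope inside the same range as a single ripple of depth $\Delta A$, regardless of how the components happen to align.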
5 Experiment and Results
5.1 Experimental Details 

Data were collected from domestic ferrets (Mustela putorius). The ferrets were anesthetized 
with sodium pentobarbital and anesthesia was maintained throughout the experiment by con- 
tinuous intravenous infusion of either pentobarbital or ketamine and xylazine, with dextrose (in 
Ringer's solution) to maintain metabolic stability. The ectosylvian gyrus, which includes the 
primary auditory cortex, was exposed by craniotomy and the dura reflected. The contralateral 
ear canal (meatus) was exposed and partly resected, and a cone-shaped speculum containing a 
miniature speaker was sutured to the meatal stump. For details on the surgery see Shamma et al.



All stimuli were computer synthesized, gated, and then fed through a common equalizer into 
the earphone. Calibration of the sound delivery system (to obtain a flat frequency response up 
to 20 kHz at the level of the eardrum) was performed in situ using a 1/8-in. probe microphone. 

Action potentials from single units were recorded using glass-insulated tungsten microelectrodes with 5-6 MΩ tip impedances. Neural signals were fed through a window discriminator, and the time of each spike relative to stimulus delivery was stored on a computer, which also controlled stimulus delivery and created raster displays of the responses. In each animal, electrode penetrations were made orthogonal to the cortical surface. In each penetration, cells were typically isolated at depths of 350-600 µm, corresponding to cortical layers III and IV^.

5.2 Obtaining the Transfer Functions 

As explained above, we measure the cells' transfer functions by first presenting, at a fixed ripple frequency, ripples of various velocities, and then, at a fixed ripple velocity, ripples of various ripple frequencies.

5.2.1 Spectral cross-section of the transfer function 

A typical example of the analysis is shown in Figure 13. Ripples moving at 8 Hz were presented, with ripple frequencies from -1.6 cyc/oct to 1.6 cyc/oct in steps of 0.2 cyc/oct; each ripple started to move at t = 0 ms but was acoustically turned on only at 50 ms, with a linear ramp over 8 ms. Each action potential is denoted by a dot on the raster plot in A. The onset response to the ripple appears at about 70 ms (50 ms + delay due to the ramping up of the stimulus + latency of the response). Each ripple is presented 15 times. Once the onset activity has died away, the cell settles into a steady-state response. For each ripple frequency, we compute a period histogram starting at 120 ms, which excludes the onset response. Four of those histograms are shown in panel B. To assess the strength and phase of the phase-locked response, we divide the period into 16 equal bins. The amplitude and phase of the response are then evaluated by Fourier transforming the histogram: the phase of $T(\Omega, \omega = 8\,\mathrm{Hz})$ is taken from the first component $AC_1(\Omega)$ of the Fourier transform, and the amplitude from

$$|T(\Omega, \omega = 8\,\mathrm{Hz})| = |AC_1(\Omega)| \cdot \frac{|AC_1(\Omega)|}{\sqrt{\sum_i |AC_i(\Omega)|^2}} \qquad (5)$$
If the modulation of the response were that of a purely linear system, the higher coefficients $AC_i(\Omega)$, $i > 1$, would be negligible. Because of half-wave rectification and other nonlinearities, however, they are usually significant. We therefore weight $AC_1(\Omega)$ by the ratio of $|AC_1(\Omega)|$ to the root-sum-square of all the coefficients $AC_i(\Omega)$, as in Eq. (5); this ratio also serves to assess linearity.
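The histogram-and-Fourier recipe of Eq. (5) can be sketched as below. This is a minimal illustration, not the authors' analysis code. Assumptions: spike times are in seconds and pooled over all sweeps; the phase is the angle of NumPy's DFT coefficient (the paper's sign convention may differ); the sum in Eq. (5) runs over every non-DC coefficient. The function name is hypothetical.

```python
import numpy as np

def transfer_function_point(spike_times, ripple_velocity, t_start=0.120, t_end=None, n_bins=16):
    """Estimate one complex value of T from spike times (seconds) pooled over sweeps."""
    if ripple_velocity == 0:
        raise ValueError("a stationary (0 Hz) ripple has no period to fold on")
    spikes = np.asarray(spike_times, dtype=float)
    spikes = spikes[spikes >= t_start]              # drop the onset response
    if t_end is not None:
        spikes = spikes[spikes < t_end]
    period = 1.0 / abs(ripple_velocity)
    # Fold on the stimulus period counted from t = 0 (start of ripple motion), so the
    # phase is referred back to 0 ms even though the analysis window starts at 120 ms.
    cycle_pos = np.mod(spikes, period) / period
    hist, _ = np.histogram(cycle_pos, bins=n_bins, range=(0.0, 1.0))

    ac = np.fft.rfft(hist)[1:]                      # AC_1 ... AC_{n_bins/2}; DC dropped
    ac1 = ac[0]
    # Eq. (5): |T| = |AC_1| * |AC_1| / sqrt(sum_i |AC_i|^2)
    magnitude = np.abs(ac1) ** 2 / np.sqrt(np.sum(np.abs(ac) ** 2))
    return magnitude * np.exp(1j * np.angle(ac1))   # phase taken from AC_1

# Synthetic, perfectly phase-locked data: one spike at 28% of every 125-ms cycle
# of an 8 Hz ripple, repeated for 15 sweeps (105 spikes in a single bin).
period = 1.0 / 8.0
spikes = [k * period + 0.28 * period for k in range(1, 8) for _ in range(15)]
T = transfer_function_point(spikes, 8.0)
print(abs(T), np.angle(T))  # about 37.12 (= 105 / sqrt(8)) and -pi/2
```

The example shows the weighting at work. Every spike is locked to the ripple, but because the histogram is a spike rather than a sinusoid, its power is spread evenly over all 8 harmonics, so $|AC_1|$ is discounted by a factor of $\sqrt{8}$.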

The magnitude and phase of the transfer function are shown in panel C. In D, we have separately inverse Fourier transformed the transfer function in quadrants 1 and 2 (equivalently, for down- and up-moving ripples), after removing the constant (temporal) phase factor $2\pi\omega\tau_d + \theta$, where $\omega = 8$ Hz. In this case, the up- and down-moving RFs match each other very well, and also match the RF obtained with a two-tone paradigm^.
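A minimal sketch of the inverse transform in panel D, for one quadrant of a spectral cross-section. Assumptions: the constant temporal phase enters $T$ as $e^{-i(2\pi\omega\tau_d + \theta)}$; the RF is taken as the real part of $\sum_k T(\Omega_k)\,e^{+i2\pi\Omega_k x}$; and `rf_from_spectral_slice` plus all numerical values are hypothetical. The same construction along $\omega$, after removing $2\pi\Omega x_m + \phi$, would give $IR(t)$.

```python
import numpy as np

def rf_from_spectral_slice(ripple_freqs, T_slice, constant_phase, x):
    """Inverse Fourier transform one quadrant of a spectral cross-section into RF(x).

    ripple_freqs   : Omega_k >= 0 (cyc/oct) at which T was measured
    T_slice        : complex T(Omega_k, omega) at one fixed ripple velocity omega
    constant_phase : 2*pi*omega*tau_d + theta (radians), assumed to enter T as exp(-1j*...)
    x              : tonotopic positions (octaves) at which to evaluate RF(x)
    """
    T_clean = np.asarray(T_slice) * np.exp(1j * constant_phase)  # remove the temporal phase
    basis = np.exp(2j * np.pi * np.outer(x, ripple_freqs))       # exp(+i 2 pi Omega x)
    return np.real(basis @ T_clean)

# Synthetic check: a cell centered at x_m = 2 octaves, measured at omega = 8 Hz.
ripple_freqs = np.arange(0.0, 1.61, 0.2)
x_m, tau_d, theta = 2.0, 0.02, 0.5
constant_phase = 2 * np.pi * 8.0 * tau_d + theta
gain = np.exp(-((ripple_freqs - 0.6) ** 2) / 0.3)
T_slice = gain * np.exp(-2j * np.pi * ripple_freqs * x_m) * np.exp(-1j * constant_phase)

x = np.linspace(0.0, 5.0, 501)   # 5 octaves = one period of the 0.2 cyc/oct grid
rf = rf_from_spectral_slice(ripple_freqs, T_slice, constant_phase, x)
print(x[np.argmax(rf)])          # close to 2.0, i.e. x_m
```

Because the ripple frequencies are sampled every 0.2 cyc/oct, the reconstructed RF is periodic with a period of 5 octaves. The sampling step therefore bounds the spectral extent that can be recovered without aliasing.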

Note that the period histograms shown in panel B correspond to periods starting at 120 ms, so as to eliminate the effect of the onset response, whereas the second graph in panel C shows phases referred back to 0 ms, the point in time at which all the ripples presented had a phase of 0 degrees.



5.2.2 Temporal Cross-Section of the Transfer Function 

An example of the extraction of the temporal cross-section of the transfer function, for the same cell as in Figure 13, is shown in Figure 14. Ripples of 0.4 cyc/oct are presented, with ripple velocities from -24 Hz to 24 Hz in steps of 4 Hz; each ripple starts to move at t = 0 ms and is acoustically turned on at 50 ms with a linear ramp over 8 ms. Each action potential is denoted by a dot on the raster plot in A. The onset response to the ripple appears at about 70 ms (50 ms + delay due to the ramping up of the stimulus + latency of the response). Each ripple is presented 15 times. Once the onset activity dies away, the cell settles into a steady-state response. For each ripple velocity, we compute a period histogram starting at 120 ms (so that the onset response is excluded). Four of those histograms are shown in panel B. To assess the strength and phase of the phase-locked response, we divide the period into 16 equal bins. The amplitude and phase of the response are then evaluated by Fourier transforming the histogram: the phase of $T(\Omega = 0.4\ \mathrm{cyc/oct}, \omega)$ is taken from the first component $AC_1(\omega)$ of the Fourier transform, and the amplitude from

$$|T(\Omega = 0.4\ \mathrm{cyc/oct}, \omega)| = |AC_1(\omega)| \cdot \frac{|AC_1(\omega)|}{\sqrt{\sum_i |AC_i(\omega)|^2}} \qquad (6)$$

If the modulation of the response were that of a purely linear system, the higher coefficients $AC_i(\omega)$, $i > 1$, would be negligible. Because of half-wave rectification and other nonlinearities, however, they are usually significant. We therefore weight $AC_1(\omega)$ by the ratio of $|AC_1(\omega)|$ to the root-sum-square of all the coefficients $AC_i(\omega)$, as in Eq. (6); this ratio also serves to assess linearity.

The magnitude and phase of the transfer function are shown in panel C. In D, we have separately inverse Fourier transformed the transfer function in quadrants 1 and 2 (equivalently, for down- and up-moving ripples), after removing the constant (spectral) phase factor $2\pi\Omega x_m + \phi$, where $\Omega = 0.4$ cyc/oct. In this case, the up- and down-moving IRs match each other very well.
Figure 13: Data analysis using ripples of fixed velocity and varying frequencies. A: Raster plot of
responses. Each point represents an action potential, and each paradigm is presented 15 times. B: 
Period histogram for 4 ripple frequencies. Note how the position of the peak of the best fit changes 
linearly with ripple frequency. C: Magnitude and phase of the period histogram fits. D: Separate 
inverse Fourier transforms for positive and negative ripple frequencies of C, obtaining a slice of the 
RF. Also given for comparison is the response area as determined by the two-tone paradigm. ^ 
Figure 14: Data analysis using ripples of fixed frequency and varying velocities. A: Raster plot of responses. Each point represents an action potential, and each paradigm is presented 15 times. B: Period histogram for 4 ripple velocities. Note how the peak of the best fit changes linearly with ripple velocity (the 0 Hz case can be used to estimate noise). C: Magnitude and phase of the period histogram fits. D: Separate inverse Fourier transforms for positive and negative ripple velocities of C, obtaining a slice of the IR.
Figure 15: Left: The positive-frequency RF computed at constant ripple velocity for 3 different ripple velocities. The shapes should be the same if the system is separable. Right: The positive-frequency IR at constant ripple frequency for 3 different ripple frequencies. The shapes should be the same if the system is separable.



5.3 Quadrant Separability 



$RF(x)$ and $IR(t)$, as illustrated in panels D of Figure 13 and Figure 14, are linear combinations of the transfer function evaluated along cross-sections of the $(\Omega, \omega)$ plane. Constancy of the shape of $RF(x)$ computed at different $\omega$ is equivalent to proportionality of $T(\Omega, \omega)$ for different $\omega$ (and similarly for $IR(t)$, $\Omega$, and $T(\Omega, \omega)$). This was the requirement given above for quadrant separability, and it has been verified for many cells in the first quadrant. While it is theoretically possible for the remaining independent quadrant to be nonseparable, this seems unlikely in ferrets, humans, and most mammals (possible exceptions might include echolocating animals, which could require further specialization). We are currently verifying separability in the second quadrant.
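The proportionality criterion above can be checked numerically. In the sketch below, `slice_similarity` implements it directly: the normalized inner product of two cross-sections equals 1 exactly when they differ only by an overall scale and phase. `separability_index` is a swapped-in summary, not the procedure described in this paper. It uses the singular value decomposition to measure how much of a quadrant's power its best rank-1 (separable) approximation captures. Both function names and the toy matrices are hypothetical.

```python
import numpy as np

def separability_index(T_quadrant):
    """Share of the total power of a quadrant's T(Omega, omega) matrix captured by
    its best rank-1 (i.e. separable) approximation: 1.0 means fully separable."""
    s = np.linalg.svd(np.asarray(T_quadrant), compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

def slice_similarity(slice_a, slice_b):
    """|<a, b>| / (|a| |b|): equals 1 exactly when two cross-sections are proportional,
    i.e. differ only by an overall scale factor and an overall phase."""
    a, b = np.asarray(slice_a), np.asarray(slice_b)
    return abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

# Rows index Omega, columns index omega (hypothetical values).
T_sep = np.outer([1.0, 2.0, 3.0], [1.0, 1j, -1.0])   # separable by construction
T_nonsep = np.outer([1.0, 0.0, 0.0], [1.0, 0.0]) + np.outer([0.0, 1.0, 0.0], [0.0, 1.0])

print(separability_index(T_sep), slice_similarity(T_sep[:, 0], T_sep[:, 2]))  # ~1.0 ~1.0
print(separability_index(T_nonsep))                                           # 0.5
```

With measured data, neither value will be exactly 1 because of noise. A threshold on either quantity would have to be calibrated, for example against the 0 Hz noise estimate.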

Shown in Figure 15 are examples of the positive-frequency RF and positive-frequency IR for two cells, computed at the different cross-sections indicated.



5.4 Quadrant Linearity 

Linearity has been verified by presenting cells with combinations of ripples from different quadrants^^'^^'^^. As shown in Figure |1^ for one cell, the correlation between the predicted and the measured response is (as in most cases) very good. Note that the predicted response is shown without half-wave rectification: since cells do not have negative firing rates, and the pentobarbital anesthetic has reduced the spontaneous activity to zero, the comparison should be made between the actual response and the half-wave rectified version of the predicted response. The correlation coefficient $\rho$ in Figure |1^ is the correlation between the measured and the predicted response. We have previously presented the correlation between prediction and response within a single quadrant for 55 cells, and found $\rho > 0.6$ for 84% of them.^^ The error bars on the measured response show the sweep-to-sweep variability of cortical cells' responses. The discrepancy between the prediction and the actual spike count is largest where both are small.
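The prediction-and-comparison step can be sketched as follows, under a stated assumption. The assumed convention is that ripple $k$ alone drives a rate modulation $B_k|T|\sin(2\pi\omega_k t + \Phi_k + \arg T)$, and the linear prediction is the sum of these terms, which is then half-wave rectified before computing $\rho$. `toy_transfer` and the "measured" trace are hypothetical stand-ins, not recorded data; the two ripples are the pair from Figure 17A.

```python
import numpy as np

def predict_ripple_response(t, ripples, transfer, baseline=0.0):
    """Linear prediction of the rate modulation to a ripple combination.

    ripples  : list of (B_k, Omega_k [cyc/oct], omega_k [Hz], Phi_k [rad])
    transfer : callable (Omega, omega) -> complex T(Omega, omega)
    Assumed convention: ripple k alone drives B_k*|T|*sin(2*pi*omega_k*t + Phi_k + arg T).
    """
    rate = np.full(t.shape, baseline, dtype=float)
    for b_k, omega_spec, omega_temp, phi_k in ripples:
        T = transfer(omega_spec, omega_temp)
        rate += b_k * np.abs(T) * np.sin(2 * np.pi * omega_temp * t + phi_k + np.angle(T))
    return rate

def compare_with_measurement(predicted, measured):
    """Half-wave rectify the prediction (rates cannot be negative, and spontaneous
    activity is zero under pentobarbital) and return it with the correlation rho."""
    rectified = np.maximum(predicted, 0.0)
    rho = np.corrcoef(rectified, measured)[0, 1]
    return rectified, rho

def toy_transfer(omega_spec, omega_temp):
    # Hypothetical transfer function (flat gain, 20-ms delay), for illustration only.
    return 0.5 * np.exp(-2j * np.pi * omega_temp * 0.02)

# Ripples of 0.4 cyc/oct at 12 Hz and -4 Hz share a common period of 250 ms.
t = np.linspace(0.0, 0.25, 200, endpoint=False)
pred = predict_ripple_response(t, [(1.0, 0.4, 12.0, 0.0), (1.0, 0.4, -4.0, 0.0)], toy_transfer)
measured = 3.0 * np.maximum(pred, 0.0) + 1.0   # stand-in "data": an affine copy of the prediction
rectified, rho = compare_with_measurement(pred, measured)
print(round(rho, 6))  # 1.0, since the stand-in data are an affine copy
```

Rectifying before correlating matters because the unrectified prediction dips below zero, where a silent cell cannot follow it. That gap is also where the text notes the largest discrepancies.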



5.5 Full-Quadrant Separability and Linearity 

The remainder of this discussion describes logical extensions that are currently under study. Thus far we have verified separability only within a single quadrant. In vision, some cortical simple cells are fully separable^^, but all are at least quadrant separable^^. We have found both types in the auditory cortex as well; Figure |1^ shows examples of each. A fully separable cell has an STRF that is a simple product of an RF and an IR, as in A. A quadrant separable cell, as in B, does not, since it responds differently to upward and downward moving ripples (as can be seen by inspecting its $STRF(x,t)$, which is not symmetric about $x_m$). The separability of a cell does not affect the linearity of its responses to ripple combinations.
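The difference between the two cell types can be illustrated with synthetic STRFs. The cell, axes, and parameters below are all hypothetical. A fully separable STRF is the outer product $RF(x)\,IR(t)$ (matrix rank 1), and its gain is identical for ripples of opposite direction. A quadrant-separable STRF, built here as the sum of two separable terms in quadrature, is a tilted pattern (rank 2) that strongly prefers one direction. Which sign of $\omega$ counts as "up" depends on the ripple convention, which is an assumption here.

```python
import numpy as np

# Hypothetical cell: axes and parameters are illustrative, not fitted to data.
x = np.linspace(0.0, 5.0, 101)                    # tonotopic axis (octaves)
t = np.linspace(0.0, 0.25, 100, endpoint=False)   # time (s)
x_m, tau, Omega0, omega0 = 2.5, 0.02, 0.5, 12.0

env_x = np.exp(-((x - x_m) / 0.8) ** 2)           # spectral envelope
env_t = (t / tau) * np.exp(-t / tau)              # temporal envelope (alpha function)
rf_c = env_x * np.cos(2 * np.pi * Omega0 * (x - x_m))
rf_s = env_x * np.sin(2 * np.pi * Omega0 * (x - x_m))
ir_c = env_t * np.cos(2 * np.pi * omega0 * t)
ir_s = env_t * np.sin(2 * np.pi * omega0 * t)

strf_full = np.outer(rf_c, ir_c)                              # RF(x) * IR(t)
strf_quadrant = np.outer(rf_c, ir_c) - np.outer(rf_s, ir_s)   # env * cos(2 pi (Omega0 (x - x_m) + omega0 t))

def ripple_gain(strf, Omega, omega):
    """|T(Omega, omega)| as the magnitude of a discrete 2-D Fourier sum of the STRF."""
    kernel = np.exp(-2j * np.pi * (Omega * x[:, None] + omega * t[None, :]))
    return np.abs(np.sum(strf * kernel))

results = {}
for name, strf in [("full", strf_full), ("quadrant", strf_quadrant)]:
    rank = np.linalg.matrix_rank(strf)
    direction_ratio = ripple_gain(strf, Omega0, omega0) / ripple_gain(strf, Omega0, -omega0)
    results[name] = (rank, direction_ratio)
    print(name, rank, round(float(direction_ratio), 2))
```

For the fully separable cell the ratio is 1. The transform of a real $IR(t)$ has the same magnitude at $\pm\omega$, so such a cell cannot distinguish the two directions. For the quadrant-separable cell the ratio is roughly 10.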



5.6 Response Characteristics 

The transfer function of a given cell is typically tuned to a characteristic ripple frequency and velocity. Across the population, cells show a wide range of characteristic ripple frequencies and velocities. Characteristic ripple velocities are mostly in the 8-16 Hz range, rarely exceeding 30 Hz, and characteristic ripple frequencies are mostly in the 0.4-0.8 cycles per octave range, rarely exceeding 2 cycles per octave (in this anesthetized preparation). The slope of the phase of the transfer function as a function of ripple frequency, $x_m$, corresponds to the center frequency of the spectral envelope, which ranges from 200 Hz to at least 24 kHz (above which our acoustic delivery system is inadequate). The slope of the phase of the transfer function as a function of ripple velocity, $\tau_d$, corresponds to the center of the temporal envelope, which ranges roughly from 10 ms to 60 ms. The RF symmetry $\phi$, which describes the effects of lateral inhibition and excitation, ranges roughly from -90° to +90° (out of a possible -180° to +180°) and is clustered around 0°. The IR polarity $\theta$, which describes the polarity of the temporal response, ranges roughly from 45° to 135° (out of a possible 0° to 180°).
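The parameters listed above can be read off a cross-section by fitting a straight line to its unwrapped phase. The sketch below assumes $\arg T = -(2\pi\Omega x_m + \phi)$ along ripple frequency and $\arg T = -(2\pi\omega\tau_d + \theta)$ along ripple velocity. It also assumes the phase changes by less than $\pi$ between neighboring samples, so that `np.unwrap` is valid. The function name and the synthetic values are hypothetical.

```python
import numpy as np

def fit_linear_phase(axis, T_slice):
    """Fit arg T = -(2*pi*axis*d + c) to one cross-section; return (d, c), c in (-pi, pi].

    Spectral slice (axis = Omega, cyc/oct): d = x_m (octaves), c = phi.
    Temporal slice (axis = omega, Hz):      d = tau_d (seconds), c = theta.
    """
    phase = np.unwrap(np.angle(np.asarray(T_slice)))   # assumes phase steps below pi
    slope, intercept = np.polyfit(np.asarray(axis, dtype=float), phase, 1)
    d = float(-slope / (2 * np.pi))
    c = float(np.angle(np.exp(-1j * intercept)))       # wrap the offset to (-pi, pi]
    return d, c

# Synthetic slices with known parameters (hypothetical values).
ripple_freqs = np.arange(0.2, 1.61, 0.2)
velocities = np.arange(4.0, 24.1, 4.0)
x_m, phi = 1.3, 0.4        # octaves, radians
tau_d, theta = 0.03, 1.2   # seconds, radians
spectral = 0.8 * np.exp(-1j * (2 * np.pi * ripple_freqs * x_m + phi))
temporal = 0.5 * np.exp(-1j * (2 * np.pi * velocities * tau_d + theta))

print(fit_linear_phase(ripple_freqs, spectral))  # approximately (1.3, 0.4)
print(fit_linear_phase(velocities, temporal))    # approximately (0.03, 1.2)
```

The unwrapping assumption ties the recoverable range to the sampling step. With 0.2 cyc/oct steps, $x_m$ must be below 2.5 octaves from the reference; with 4 Hz steps, $\tau_d$ must be below 125 ms.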



6 Conclusions 

The emphasis in this review has been on presenting a technique to describe neural response 
patterns of units in the cortex. More precisely, we use moving ripples to characterize the 
response fields of auditory cortical neurons, although this is a general method that can be used 
anywhere responses are shown to be substantially linear for broadband stimuli. 

Practically, we find that because cortical responses are linear with respect to the spectral envelope, we can use the ripple method to characterize auditory cortical cell responses to dynamic, broadband sounds. The linearity of cortical unit responses is quantified by the correlation coefficient between the predicted and the measured response curves. While at this point we do not have statistics quantifying the linearity of responses to ripples moving in both directions, linearity within one quadrant (to down-moving ripples) has been extensively quantified^^, and we have no reason to expect linearity to be any different for ripples moving in
Figure 17: Predictions of responses to complex dynamic spectra using the STRF. A: The predicted response is computed by a convolution (along the time dimension) of the STRF with the spectrogram. The stimulus shown is composed of two ripples (0.4 cycles/octave at 12 Hz and -4 Hz). The predicted waveform is shown juxtaposed to the actual response (crosses) over one period of the stimulus, in spikes/bin summed over 30 sweeps. B: Another example: the stimulus consists of a combination of ripples with ripple frequencies 0.2 cycles/octave at 4 Hz, 0.4 cycles/octave at 8 Hz, ... 1.2 cycles/octave at 24 Hz, in cosine phase, resulting in an FM-like stimulus. In this ketamine/xylazine preparation, the spontaneous activity was non-zero.



both directions. The separability of cells is what makes the ripple method practical, given the time needed to characterize a cell. One advantage of the method is that it probes spectral and temporal characteristics simultaneously. Temporal processing is increasingly recognized as an essential part of cortical function, and the ripple method places it on an equal footing with spectral processing. A caveat is that, thus far, the method has only been applied to the steady-state (i.e., periodic) response of cells.

We find that response fields in AI tend to have characteristic shapes both spectrally and 
temporally. Specifically, AI cells are tuned to moving ripples, i.e., a cell responds well only 
to a small set of moving ripples around a particular spectral peak spacing and velocity. We 
find cortical cells with all center frequencies, all spectral symmetries, bandwidths, latencies and 
temporal impulse response symmetries. One way to interpret this result is that AI decomposes 
the input spectrum into different spectrally and temporally tuned channels. Another equivalent 
view is that a population of such cells, tuned around different moving ripple parameters, can 
effectively represent the input spectrum at multiple scales. For example, spectrally narrow cells 
will represent the fine features of the spectral profile, whereas broadly tuned cells represent the 
coarse outlines of the spectrum. Similarly, dynamically sluggish cells will respond to the slow 
changes in the spectrum, whereas fast cells respond to rapid onsets and transitions. In this 
manner, AI is able to encode multiple different views of the same dynamic spectrum. From this, we conclude that the primary auditory cortex performs a multi-dimensional, multi-scale wavelet transform of the auditory spectrum.

Pitch is very important to the auditory system. The spectral ripple stimuli presented here have no pitch, since they are synthesized with logarithmically spaced carrier tones. We have not yet examined unit responses to ripple spectra with harmonically related carrier
tones. Consequently, all our unit responses are due to the envelope or spectral profile of the 
broadband stimulus, and are not dependent on the carrier tones. It is quite possible that the 
pitch of a harmonic series of tones will affect the responses. It is also possible that sufficiently 
narrowly tuned cells might directly encode the harmonic spacing in a spectrum in a systematic 
manner to encode the pitch as was discussed in detail in Wang and Shamma^^. This is work in 
progress. 

The suggestion that cortical cells are linear might appear far-fetched given their nonlinear responses to pure tones, such as rate-intensity functions with threshold, saturation, and non-monotonic behavior (Brugge and Merzenich^^; Nelken et al.^). Nevertheless, we find that the nonlinearity observed with broadband ripple spectra is substantially smaller than with tonal stimuli, when it comes to predicting the response of a cell to a combination of stimuli from its responses to the individual ones. Furthermore, just as response properties measured with tones, such as bandwidth, rate-level functions, tuning quality factor and other measures, are considered meaningful, the characteristics of the ripple responses prove useful and relate to the properties measured with tones^^'-^^. Investigations currently under way in the inferior colliculus will shed light on the mechanisms that allow cells in auditory cortex, so many synapses away from the auditory nerve, to exhibit linear behavior.